Skip to main content

Personal Knowledge Base

Why most system outage or problems happens at 2am-3am time?

  1. Maintenance & deployments are scheduled at night

Meaning: Most systems are intentionally changed at night to avoid disturbing users. Changes are when things break.

Why 2–3 AM specifically

  • Lowest traffic window (India: ~2–6 AM IST; US/EU overlap also favors this time)
  • Businesses define this as a maintenance window (pre-approved “safe” time to touch prod)

What happens

  • Code deploys
  • Database migrations
  • Config changes
  • Infra upgrades (kernel, Kubernetes nodes, load balancers)

Reality: Most outages are caused by change, not load.

This is a known SRE (Site Reliability Engineering) principle.

  1. Humans are tired → mistakes spike 😴

Meaning: 2–3 AM is peak human cognitive low point.

Scientifically:

  • Reaction time slows
  • Attention drops
  • Risky decisions increase

In India especially:

  • On-call engineers often work day + night on-call
  • Sleep debt accumulates
  • “Just push it, it’ll be fine” mindset

Examples

  • Running a migration on the wrong DB
  • Forgetting WHERE in a SQL update
  • Restarting the wrong service/cluster
  • Copy-pasting prod credentials into staging scripts
  1. Low traffic hides problems until suddenly it doesn’t 📉→📈

Meaning: Problems exist earlier, but they become visible only when traffic pattern shifts.

At night:

  • Background jobs (ETL, cron, backups) run heavily
  • Queues fill silently
  • Latency increases slowly

Then:

  • Morning traffic hits
  • System collapses
  • People think “it failed at 3 AM”, but root cause started earlier
  1. Infra & cloud scheduled tasks run at night ☁️

Cloud providers (AWS, GCP, Azure) often do:

  • Hardware maintenance
  • Network rebalancing
  • Spot instance reclamation
  • Auto-scaling recalculations